Abstract: Honeypots are used in IT security to detect and
gather information about ongoing intrusions, e.g., by documenting
the approach of an attacker. Honeypots do so by presenting
an interactive system that seems just like a valid application
to an attacker. One of the main design goals of honeypots is
to stay unnoticed by attackers as long as possible. The longer
the intruder interacts with the honeypot, the more valuable
information about the attack can be collected. Of course, another
main goal of honeypots is to not open new vulnerabilities that
attackers can exploit. Thus, it is necessary to harden the honeypot
and the surrounding environment. This paper presents Apate,
a Linux Kernel Module (LKM) that is able to log, block, and
manipulate system calls based on preconfigurable conditions
such as Process ID (PID), User ID (UID), and many more. Apate
can be used to build and harden High Interaction Honeypots
and is configured using an integrated high-level language.
Thus, Apate is an important and easy-to-use building block for
upcoming High Interaction Honeypots.

Index Terms: Honeypot; Intrusion Detection; Linux Kernel;
Rule Engine

I. Introduction 

Honeypots are well-known tools for intrusion detection
and IT security research. Usually, honeypots fall into one of
two classes: Low Interaction Honeypots and High Interaction
Honeypots. A Low Interaction Honeypot simulates attackable
services, systems, or environments, whereas a High Interaction
Honeypot offers a real exploitable service, system, or
environment. Because a honeypot is in most cases not a production
system, every activity on a honeypot is either unintended use
or an attack.

When deploying a High Interaction Honeypot, it is necessary
to harden the honeypot so that attackers cannot gain
unintended control of the system running it. A
High Interaction Honeypot should by definition be exploitable,
but it should prevent annoying or harmful operations on the
honeypot system. Another important requirement for a High
Interaction Honeypot is to log as much information as possible
about the state of the system and about ongoing intrusions.
Therefore, a High Interaction Honeypot needs a highly flexible
way to decide which information should be logged and which
should not. Apate offers such a flexible way, using a high-level
language for configuration. It should also be possible to log
information at as fine a granularity as possible. Apate offers
logging at the system call level. Manipulation of system calls,
depending on user interaction or the system environment, is
necessary to provide High Interaction Honeypot functionality.
This allows the honeypot provider to present different
environments depending on PID, UID (and many more), or
system call parameters. For example, the High Interaction
Honeypot provider is able to present one file structure to PID
42 and a completely different file structure to PID 43. This
manipulation can be used to deceive an attacker. Furthermore,
it can also be used to suppress harmful actions: the honeypot
admin is able to prevent the execution of a system call. Blocking
a system call can be done by simply not calling the
real system call, or in a more sophisticated way. Finally,
High Interaction Honeypot components (like
the proposed LKM) should be hard for intruders to detect.
This requirement calls for sophisticated techniques already
known from rootkits. For production use, a
High Interaction Honeypot module must also have only low computational
overhead, so that an attacker cannot detect the High
Interaction Honeypot by observing performance anomalies.

Apate is a Linux Kernel module that fulfills all requirements 
mentioned above. Hence, it is an important building block for 
High Interaction Honeypots. 

The rest of this paper is structured as follows: Section II
provides an overview of related work. Section III describes
the design and implementation of Apate in detail. Section IV
shows the evaluation of Apate. Section V concludes the paper.

II. Related Work

A well-known honeypot tool, based on an LKM for the 2.6 Linux
kernel, is Sebek. Sebek is primarily used for logging
purposes in High Interaction Honeypots. Thus, it provides
several methods for detailed logging (like logging via network
or GUI). Ways to detect Sebek have been described in the literature. Sebek
does not provide the possibility to manipulate system calls,
hence it does not offer information logging as fine-grained
as that provided by Apate.

Another approach for monitoring systems is to use virtual
machine introspection and system view reconstruction. For
example, the approach in [8] and two related systems work this way. Introspection
realized at the hardware level of the virtual machines offers
a stealthier approach than Apate. However, Apate provides
additional means to manipulate the behavior of system calls,
which these introspection-based systems do not support; hence Apate is
better suited to honeypots that must manipulate system calls.

SELinux is a well-known tool for inserting hooks
at different locations inside the kernel. Such an approach
provides access control for critical kernel routines. SELinux
can be controlled at a very fine-grained level with an
embedded configuration language. Although SELinux is very
useful for hardening a kernel, it is not designed for honeypot
purposes. In particular, it lacks the ability to deceive the
attacker with “wrong” information.

Grsecurity with PaX is similar to Apate. However,
it differs greatly in ease of deployment and ease of
configuration. It also lacks the ability to deceive the attacker
with “wrong” information.

In conclusion, none of the mentioned related work fulfills all
requirements listed in Section I. Apate fulfills all of them and
hence is a useful building block for upcoming High Interaction
Honeypots.

III. Design and Implementation

Apate intercepts system calls and allows custom
code to be executed in these calls. Figure 1 shows the interception strategy
of Apate.



Fig. 1. Interception strategy of Apate


Apate does not manipulate the syscall table, in order to prevent
detection (see Subsection III-D for details). Apate intercepts
the syscall within the syscall target, i.e., the real syscall address
is called, but Apate jumps immediately to the interception
routine (after consuming some decoy assembler code). The
hook decides which action to invoke, based on the rules for this
system call. Within this action, it is able to manipulate, block,
and/or log a system call. The following hooks are implemented
in Apate:

• sys_open, sys_close, sys_open 

• sys_read, sys_write, sys_unlink 

• sys_execve

• sys_getpid, sys_getuid 


• sys_mkdir, sys_rmdir 

• sys_getdents 

This paper focuses on the usage of system calls that are
related to file I/O and execution control, as these system calls
are useful for hardening High Interaction Honeypots.

A. Configuration 

Apate can be configured in a very flexible way, as can be
seen in Figure 2. The configuration file rules.apate, written in
a high-level language (see Section III-B for details), is
compiled by the Apate compiler, resulting in the file apaterules.c.
Together with the original source code, this file is then built into
the Apate LKM. The resulting LKM can be loaded into the kernel
with the common insmod utility. Once loaded, the ruleset is active.



Fig. 2. Configuration workflow of Apate

The configuration consists of rulesets. A ruleset is an
ordered list of rules. A system call gets intercepted when one
or more rules match. One system call can have more than one
matching rule with different decision parameters. There are
three major types of decision parameters:

• Parameters that are independent of the system call, like PID,
UID, SSID (in fact, every variable from struct task_struct
could be used for conditions).

• Parameters that depend on the specified system call.
Often, these are function parameters like paths.

• Parameters that are defined by functions. This kind of decision
parameter makes it possible to build reactive systems. For example,
one can define a condition that reads some file and is
true whenever the file contains a keyword.

Rules are defined as stated below:

Let true = 1 and false = 0. Let $c(a, b)$ be a condition
such that:

$$C : A \times B \to \{0, 1\}, \qquad (a, b) \mapsto c(a, b) \tag{1}$$

where $a \in A$ and $b \in B$ are two parameters, which are used
by $c()$ for the calculation of the condition. The parameters are
further called decision parameters. For example, a decision
parameter could be the path of the system call sys_open(),
and the related condition is (param[0] == "/etc/passwd")
? 1 : 0, where a = param[0] and b = "/etc/passwd".
This rule matches system calls trying to get access to the file
"/etc/passwd".

Let $cb(d, e, f)$ be a condition block. A condition block
combines the results of conditions or other condition blocks
with AND or OR. A condition block uses two parameters $d$
and $e$ and an operator $f$. $d$ and $e$ can be the result of any
$c(a, b)$ or another $cb(d, e, f)$.

$CB$ is the set of all possible condition blocks:

$$CB : \{0, 1\} \times \{0, 1\} \times \{\mathrm{AND}, \mathrm{OR}\} \to \{0, 1\}, \qquad (d, e, f) \mapsto cb(d, e, f) \tag{2}$$

Further, the set of all conditions and condition blocks is $AC$
such that

$$AC = C \cup CB \tag{3}$$

For $cb(d, e, f)$ let $d, e \in AC$ and $f \in \{2, 4\}$. When $f = 2$
the operator AND will be used. When $f = 4$ the operator OR
will be used. $c^*(a, b)$ is a condition that always returns true.
The second parameter in the condition block can be neutralised
with $cb(d, c^*(0, 0), 2)$.

This leads to the definition 


$$cb(d, e, f) = \begin{cases} 1 & \text{if } (d + e) \cdot f \geq 4 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$


This definition makes it possible to group different conditions
and to respect operator precedence.
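The arithmetic encoding of AND and OR can be checked with a few lines of Python. This is only an illustrative sketch, not part of Apate: the names `cb`, `always_true`, `AND`, and `OR` are chosen here for readability, and the threshold in Eq. (4) is read as "at least 4", since with a strict comparison the AND case (where the product is at most 4) could never become true.

```python
from itertools import product

AND, OR = 2, 4  # operator codes f as given in the text


def cb(d: int, e: int, f: int) -> int:
    """Condition block per Eq. (4): 1 if (d + e) * f >= 4, else 0."""
    return 1 if (d + e) * f >= 4 else 0


def always_true(a=0, b=0) -> int:
    """The neutral condition c*(a, b), which always returns 1."""
    return 1


# Truth tables produced by the arithmetic encoding.
and_table = {(d, e): cb(d, e, AND) for d, e in product((0, 1), repeat=2)}
or_table = {(d, e): cb(d, e, OR) for d, e in product((0, 1), repeat=2)}

# Nesting gives grouping: (1 AND 0) OR 1, evaluated inside-out.
nested = cb(cb(1, 0, AND), 1, OR)

# Neutralising the second operand with c* leaves d unchanged.
passthrough = [cb(d, always_true(0, 0), AND) for d in (0, 1)]
```

Because a condition block accepts the result of another condition block as an operand, arbitrary groupings such as (c1 AND c2) OR c3 can be written without a separate precedence mechanism.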

Let $A$ be the set of atomic actions. An atomic action is
a function that provides only one single functionality. For
example, an atomic action can be the redirection of a system
call. An action $a \in A$ falls into one of three groups:

• Manipulating actions

• Logging actions

• Blocking or emergency exit actions

Let $AS'$ be an ordered list of actions. The index function $i(x)$
assigns an index to each element $x \in AS$, hence

$$AS = \{x \in AS' \mid 0 < i(x - 1) < i(x)\} \tag{5}$$

Let $ASS$ be the set of all action lists. A rule consists of
one condition block $g \in CB$ and an action set $h \in ASS$. Let
$R$ be the set of all rules. Whenever the condition block returns
1, the action set $h$ is started.

Let $RS$ be a sorted list of rules ($RS \subseteq R$). Each element
of $RS$ has a flag $fl$. A flag is defined as

$$fl \in \{\mathit{exit} = 1, \neg\mathit{exit} = 0\} \tag{6}$$

When a system call is invoked, the rules in $RS$ are evaluated
in order, beginning with the first rule, until a rule evaluates to
true and has $fl = 1$.


Using the definitions above, a highly configurable system
can be built. Including some basic predefined conditions
enhances convenience, e.g., equality checks for integers, floats,
or strings.
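The evaluation order of a ruleset with exit flags can be modelled in user space as follows. This is a minimal sketch under stated assumptions, not the Apate implementation: `Rule`, `run_rules`, the dictionary used to represent a system call, and the path /honey/passwd are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    condition: Callable[[Dict], int]        # condition block, returns 0 or 1
    actions: List[Callable[[Dict], None]]   # ordered action set
    exit_flag: int = 0                      # fl: 1 = stop once this rule fires


def run_rules(rules: List[Rule], call: Dict) -> List[int]:
    """Evaluate rules in order; return the indices of the rules that fired."""
    fired = []
    for index, rule in enumerate(rules):
        if rule.condition(call) == 1:
            for action in rule.actions:
                action(call)
            fired.append(index)
            if rule.exit_flag == 1:     # true and fl = 1: stop evaluating
                break
    return fired


log: List[str] = []
rules = [
    # Log every call made by a non-root user.
    Rule(lambda c: int(c["uid"] > 0), [lambda c: log.append(f"uid={c['uid']}")]),
    # Redirect /etc/passwd to a decoy file and stop (exit rule).
    Rule(lambda c: int(c["path"] == "/etc/passwd"),
         [lambda c: c.update(path="/honey/passwd")], exit_flag=1),
    # Catch-all rule, only reached when the exit rule did not fire.
    Rule(lambda c: 1, [lambda c: log.append("fallthrough")]),
]

call = {"uid": 1000, "path": "/etc/passwd"}
fired = run_rules(rules, call)
```

In this model a rule that matches without an exit flag does not end the evaluation, which is what allows a logging rule to be combined with a later manipulating rule for the same system call.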

B. Configuration High Level Language 

The configuration language implements two main requirements.
First, the configuration language should be flexible,
including the ability to reuse patterns, store variables, calculate
with operators, embed external functions, define functions, and
use decision statements. This allows the language to
describe even very complex scenarios. Second, the configuration
language should provide a transparent way to define
rules related to honeypots (or, in the scope of this paper, to control
and manipulate system calls). To deal with these requirements,
the Apate language combines concepts known from functional
programming (in this case Haskell) with a concept well
known from packet filter configuration (in this case pf).

This section gives a brief introduction to the important parts
of the language. For the sake of clarity, some convenience
features of the Apate language (e.g., embedded C, self-defined
functions, loops) are omitted.

Figure 3 shows some example source code for the Apate
language.

define c1,c2,c3 as condition
define r1,r2 as rule
define a1,a2 as action
define cb1 as conditionblock
define rc1 as rulechain
define sy1 as syscall

let c1 be testforpname
let c2 be testforparam
let c3 be testforuid
let a1 be manipulateparam
let a2 be log
let sy1 be sys_open

let cb1 be {(c1("mysql") && c2(0;"/var/\
lib/mysql/*"))}

let r1 be {cb1->a1(0;"/var/lib/mysql/*"\
;"/honey/mysql/")}
let r2 be {{c3(">",0)}->a2()}
let rc1 be {r2,:r1} // :defines exit

bind rc1 to sy1

Fig. 3. Example source code in the Apate language


The first block with the define statements binds variables
to different types (like conditions, rules, or functions). The code
block with the let statements points these variables to values
or functions. In this case, it defines three conditions (c1, c2, c3).
c1 tests the current process name against another string.
c2 tests whether a parameter of the current syscall is equal to a given
value. c3 compares the current UID with a given value.
a1 and a2 are actions: a1 manipulates a parameter of the current
system call, and a2 logs a system call. The variable cb1 represents
a condition block. Its let assignment also shows that it is
possible to write nested variable assignments. In this case, the
conditions c1 and c2 are combined with && (AND). In the same
line, the conditions c1 and c2 are given their parameters. In
this case, the condition c1 checks whether the current parent process
is the mysql process. c2 checks whether the first parameter (0)
of the current system call is equal to /var/lib/mysql/*. The
asterisk denotes a wildcard. The rule assignment
let r1 be ... binds a condition block to an action. In
this case, it means that whenever the condition block returns true,
the action a1 rewrites the first parameter of the current system
call: it replaces /var/lib/mysql with /honey/mysql. The rule
r2 logs the current system call whenever the current UID
is greater than 0. A ruleset (rulechain) rc1 is assigned
r2 and r1. The rule r1 is also marked as an exit rule (:r1).
When this rule fires, no further rules will be called. In the last
line, the rule chain rc1 is bound to the system call sys_open.

In conclusion, when the system call sys_open is invoked,
the parent process is the mysql process, and the system call
parameter (in this case the path that should be opened)
begins with /var/lib/mysql/, the syscall is manipulated so that
it opens a file under /honey/mysql/... instead. The second
rule means that every call to sys_open is logged, except
when the root user makes the call.
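The effect of the example ruleset on sys_open can be imitated in Python. The sketch below is a user-space model only and makes several assumptions: `rewrite_path` and `sys_open_hook` are hypothetical names, the process check is modelled as a plain comparison of a process name string, and the wildcard is interpreted with Python's `fnmatch` rules, which may differ from Apate's own wildcard function.

```python
from fnmatch import fnmatchcase

PATTERN = "/var/lib/mysql/*"
TARGET = "/honey/mysql/"


def rewrite_path(path: str, pattern: str, target_dir: str) -> str:
    """Redirect path into target_dir if it matches the wildcard pattern."""
    if not fnmatchcase(path, pattern):
        return path
    prefix = pattern.rstrip("*")            # "/var/lib/mysql/"
    return target_dir + path[len(prefix):]


def sys_open_hook(pname: str, uid: int, path: str, log: list) -> str:
    """Model of rule chain rc1: r2 (log non-root), then exit rule r1 (redirect)."""
    if uid > 0:                                        # r2: c3(">",0) -> a2()
        log.append((pname, uid, path))
    if pname == "mysql" and fnmatchcase(path, PATTERN):  # r1: cb1 -> a1(...)
        path = rewrite_path(path, PATTERN, TARGET)
    return path                      # path handed to the original sys_open


log = []
redirected = sys_open_hook("mysql", 27, "/var/lib/mysql/db/users.MYD", log)
untouched = sys_open_hook("bash", 0, "/var/lib/mysql/db/users.MYD", log)
```

Here the mysql process is transparently served files from the decoy directory, while a root shell opening the same path is neither redirected nor logged.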

C. Manipulation of System Calls 

If a rule matches, the corresponding action chain is called
to manipulate the original system call. An action chain has a
length $l$ with $1 \le l \le n$.

Figure 4 shows an example of the manipulation strategy.
Functions prefixed with f_ are actions.



Fig. 4. Conceptual Manipulation Strategy 


The dispatcher represents the rule engine, deciding which
action chain should be used. In this example, the first action
logs the system call. The next action manipulates some
parameter, such as a path. Then f_call_orig calls
the original system call with the manipulated parameter. The
result is returned to the caller.


Technically, an action is a function that consumes all
system call parameters, including the current struct task_struct
and a pointer to the syscall_result variable. Each function
returns an integer indicating whether the function call has been
successful or not. Whenever a function returns an error, the
action chain is aborted and an error routine is called.
Finally, the hook returns the syscall_result. In case of an error,
the system call returns a system-call-dependent error.
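The control flow of an action chain can be sketched in Python as follows. This is an illustrative model rather than Apate code: apart from f_call_orig, the function names (`f_log`, `f_manipulate`, `f_deny`, `run_chain`) are hypothetical, dictionaries stand in for the system call parameters, struct task_struct, and syscall_result, and the choice of EACCES as the returned error is an assumption made only for this example.

```python
from typing import Callable, Dict, List

OK, ERROR = 0, -1
EACCES = 13  # example errno; the real error depends on the system call

Action = Callable[[Dict, Dict, Dict], int]  # (params, task, result) -> status


def f_log(params, task, result):
    task.setdefault("log", []).append(dict(params))   # record a copy
    return OK


def f_manipulate(params, task, result):
    params["path"] = params["path"].replace("/var/lib/mysql/", "/honey/mysql/", 1)
    return OK


def f_deny(params, task, result):
    return ERROR            # a blocking action simply reports failure


def f_call_orig(params, task, result):
    result["value"] = f"opened {params['path']}"   # stand-in for the real syscall
    return OK


def run_chain(chain: List[Action], params: Dict, task: Dict):
    """Run actions in order; abort on the first error and return an error code."""
    result = {"value": None}
    for action in chain:
        if action(params, task, result) != OK:
            return -EACCES          # error routine: syscall-dependent error
    return result["value"]          # the hook returns syscall_result


task = {"pid": 42}
value = run_chain([f_log, f_manipulate, f_call_orig],
                  {"path": "/var/lib/mysql/a"}, task)
blocked = run_chain([f_deny, f_call_orig], {"path": "/x"}, {})
```

Since every action receives the same parameter and result objects, a manipulating action placed before f_call_orig changes what the original system call sees, and a failing action prevents the original call from running at all.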

D. Hiding Hooks and LKM 

An attacker should not be able to detect Apate; otherwise
Apate would not be suitable for High Interaction Honeypots.
As hiding software in all use cases is very difficult, Apate
must at least hide itself well enough that the effort of detecting it
is unreasonably high for an intruder. This requires defining
which effort is unreasonably high for an attacker and which
is not. The following actions are defined as reasonable for an
attacker and hence should be prevented:

• Testing for module presence with standard utilities like
lsmod, modinfo, ... or noticing misleading errors when using
insmod and similar tools

• Testing for presence of the module in /proc/modules and
/sys/module

• Testing for presence of Apate-related logfiles, configurations,
and other artifacts

To hide Apate, it is necessary to remove the module from
the module list. Simplified, all modules are represented in a
global linked list. By using

list_del_init(&__this_module.list);

the module is removed from this list and is therefore invisible. To hide from
/sys/module, Apate uses

kobject_del(&THIS_MODULE->mkobj.kobj);

to remove itself from this representation. With these
modifications, the module is invisible to standard utilities (they use
/proc/modules) and in /sys/module. These techniques are
also well known from rootkits; see for example [18], [19].

Apate does not use any configuration files beyond the
configured rules. The high-level rule file should be deleted by the
honeypot admin after it has been compiled into Apate. Hence, Apate
cannot be identified by an attacker looking for configuration
files.

Apate itself is used to cloak its logfiles: predefined rules in Apate
prevent all users from seeing, reading, or writing Apate logfiles. To gain
access to the logfiles, a system administrator needs to restart the
host system without the honeypot.

To detect a hook, an intruder needs to analyze physical
memory. Apate makes it hard to load a new module into the
kernel: it prevents another kernel module from being loaded by overriding
the flag that controls the module loading ability. Beyond the
possibilities of Apate, the honeypot admin can harden the host
system to ensure that dumping memory requires a high effort from an
intruder.

Apate has different options for inserting hooks into system
calls. By default, Apate changes the function pointer in the
system call table. This is sufficient as long as the intruder has no
possibility to compare the original table with the hooked table.
If this is not enough protection, the admin can decide to harden
the system with some anti-rootkit technologies. This makes it
impossible for Apate to overwrite the jump points. Figure 5
shows the alternative hooking technology. This technology is
well known from Windows and Linux rootkits. During the
hooking process, Apate stores the first n bytes of the target
system call function. The stored instructions are copied
to a trampoline function. In place of the original instructions,
Apate injects a jump operation. This lets the process jump into
the hooking function immediately after entering the original
system call function. Whenever the hooking function calls
the original system call, it calls the trampoline function. The
original code is executed, then the trampoline function lets
the process jump into the original function at an offset of
n bytes. The trampoline is a feature to obfuscate the hook
from rootkit detection tools and uses live patching technologies.
Thus, it can be detected by disassembling a core dump. This
is out of scope of this paper, as it is assumed that the effort to
detect the honeypot with disassembling tools is too high.



Fig. 5. Hooking using a so-called trampoline
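The control flow of trampoline hooking can be illustrated with a toy interpreter. The sketch below does not patch machine code: functions are modelled as Python lists of instruction tuples, one list slot stands in for one byte, and the names `run`, `install_trampoline`, and the `"op"`/`"jmp"` instruction kinds are hypothetical and introduced only for this illustration.

```python
def run(code, name, trace, offset=0):
    """Tiny interpreter: execute the instructions of `name` starting at `offset`."""
    for instr in code[name][offset:]:
        kind = instr[0]
        if kind == "op":                 # ordinary instruction: record it
            trace.append(instr[1])
        elif kind == "jmp":              # transfer control and do not come back
            _, target, target_offset = instr
            run(code, target, trace, target_offset)
            return


def install_trampoline(code, target, hook, n):
    """Patch `target` to enter `hook` first; the saved slots go to a trampoline."""
    saved = code[target][:n]                        # first n original slots
    tramp = target + "_trampoline"
    code[tramp] = saved + [("jmp", target, n)]      # replay, then resume at n
    # Overwrite the first n slots: one jump plus padding that is never reached.
    code[target][:n] = [("jmp", hook, 0)] + [("op", "pad")] * (n - 1)
    return tramp


code = {
    "sys_open": [("op", "open:1"), ("op", "open:2"), ("op", "open:3")],
    "hook": [("op", "hook:log")],
}
tramp = install_trampoline(code, "sys_open", "hook", n=2)
code["hook"].append(("jmp", tramp, 0))   # the hook reaches the original via the trampoline

trace = []
run(code, "sys_open", trace)
```

Entering the patched function runs the hook first, after which the trampoline replays the displaced instructions and resumes the original function at offset n, so the original behaviour is preserved while the system call table itself stays untouched.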

IV. Evaluation 

There are three major goals for Apate. The first goal
is to provide a highly flexible configuration. Although the
proposed configuration system works well and is suitable for
High Interaction Honeypots, it must be ensured that every
possible combination of rulesets and actions can be described.
This means that the configuration language must be Turing
complete. For this, it is assumed that an action a is able
to decide which rule from the ruleset should be invoked next.
This means it can jump to any other rule from a given ruleset.
It is also assumed that Apate has an array (in this case an
idealized array with infinitely many indices) which can hold any other
type (like actions, rules, other defined variables, or anything
else). Technically, Apate has a register and a stack. Last, it is
assumed that an action is able to write or read any index of this
array. Together with actions for calculations and conditions
for jump decisions, the system is Turing complete. In fact, any
action is just a C function, and the define statement creates
variables.


The second goal is to provide a system which achieves a
suitable level of stealthiness. As described in Subsection III-D,
the system hides itself from common utilities like lsmod,
modinfo, modprobe, and insmod. Apate is also not visible in
/proc/modules or /sys/module. To test for presence in any
logfile, a simple grep command with typical signatures for
Apate is run on the full system (simplified, each log entry or
configuration includes the string “apate”, so it would be easy to
detect). However, with proper rules these log entries are not
visible to standard system commands.

The third goal is that Apate should be efficient. Performance
tests should ensure that Apate is usable in production
scenarios. The most important performance factor is the
overhead of logging. To evaluate the performance of Apate in a
production scenario, the execution times of sys_open, sys_write,
sys_read, and sys_close are measured. sys_open and
sys_close are called only once, when a file is opened or closed.
sys_write and sys_read are called more often (under the
condition that heavy writing is done on the system). Thus,
the test pattern concentrates on sys_write and sys_read. For
the performance evaluation, data is copied from one file to
another using increasing file lengths. This is done
100 times for each file size. The source file is generated on
the fly from /dev/random before each copy command. After
each successful copy command the target file is deleted. A
Gentoo 64-bit system with 32 GB RAM and 16 cores is used
for all performance tests. The kernel is optimized by disabling
unnecessary drivers and by enabling some debugging flags.
One source file is generated for each size with random bits
and a length of $l(\mathit{file})$ bytes. Let the size of the file be:


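$$0 < l < 1{,}000{,}000{,}000 \tag{7}$$

and

$$l_n(\mathit{file}) = \begin{cases} l_{n-1} + 1 & \text{if } l_n < 1{,}000{,}000 \\ l_{n-1} + 1{,}000 & \text{if } 1{,}000{,}000 < l_n \wedge l_n < 100{,}000{,}000 \\ l_{n-1} + 1{,}000{,}000 & \text{if } 100{,}000{,}000 < l_n \wedge l_n < 1{,}000{,}000{,}000 \end{cases} \tag{8}$$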


Four different settings were tested. 

The first setting, m1, is used as the reference setting. It does
not use any interception.

The second setting, m2, uses only one rule, which always
returns true. The related action set calls the original system call
and logs this action. This is the shortest way in Apate to
provide logging functionality. This setting is used to evaluate
the logging overhead of Apate.






























TABLE I. Performance Measurement

Measurement          m1        m2        m3        m4
Measurements         110,800   110,800   110,800   110,800
Unique file sizes    1,108     1,108     1,108     1,108
sd(runtime sec)      0.1066    0.2421    -         0.2452
var(runtime sec)     0.0114    0.0586    -         0.0601
iqr(runtime sec)     0.0010    0.0026    -         0.0023


Fig. 6. Relation between runtime, file size, and rules for sys_open (median runtime for each file size; x-axis: size in KB)


The third and fourth settings, m3 and m4, evaluate the
influence of rules. Each rule consists of 50 conditions
$\{c_0, c_1, \ldots, c_{50}\}$, where the conditions are combined with
AND. The last condition returns false. Each test
uses 50 rules. The last condition in rule number 50 (the last rule)
returns true. Overall, each system call passes 2,500 conditions.
This triggers an action set that calls the original system
call (m3 and m4) and then logs this action (only m4).

Table I shows the results of the performance evaluation. The
sd row shows the standard deviation, var shows the variance,
and iqr shows the interquartile range.
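The figures in Table I can be cross-checked with simple arithmetic. The following sketch assumes that the three values listed per statistic belong to the settings m1, m2, and m4 (the table shows no value for m3); the variable names are chosen here for illustration only.

```python
repetitions = 100          # copies per file size, as stated in the text
unique_sizes = 1_108
measurements = unique_sizes * repetitions

# Values as listed in Table I (assumed column assignment: m1, m2, m4).
sd = {"m1": 0.1066, "m2": 0.2421, "m4": 0.2452}
var = {"m1": 0.0114, "m2": 0.0586, "m4": 0.0601}

# The variance is the square of the standard deviation.
var_from_sd = {name: round(value * value, 4) for name, value in sd.items()}
```

The number of measurements equals the number of unique file sizes times the 100 repetitions, and each reported variance agrees with the squared standard deviation to four decimal places.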

Figure 6 shows the correlation between file size and runtime.
For every curve, the medians of the measured runtimes for each
unique file size are connected with a line. The m1 curve shows
the reference setting. The m2 curve shows that the logging
component has a big influence on performance. Each syscall
and its values were logged. Each log entry was sent to another
server using UDP. Gentoo uses a buffer of 65,365 bytes.
Copying a file of one gigabyte takes 32,720 syscalls. This
explains the overhead of m2 and m4. The m3 curve shows that
the rule engine works with just a small overhead when only
conditions are processed. For one measurement with a file size
of one gigabyte, the engine processed 81,800,000 conditions.
However, to copy a file of less than 65,365 bytes, only 4
syscalls are made and therefore only 10,000 conditions are
processed.
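The condition counts quoted above follow directly from the test setup of 50 rules with 50 conditions each. A short calculation, with names chosen here for illustration, reproduces them from the syscall counts given in the text:

```python
RULES = 50
CONDITIONS_PER_RULE = 50
conditions_per_syscall = RULES * CONDITIONS_PER_RULE


def conditions_processed(syscalls: int) -> int:
    """Total number of conditions evaluated for a given number of syscalls."""
    return syscalls * conditions_per_syscall


one_gigabyte = conditions_processed(32_720)   # syscalls for a one-gigabyte copy
small_file = conditions_processed(4)          # syscalls for a file below the buffer size
```

The rule-engine workload therefore grows linearly with the number of intercepted system calls, which is consistent with the small overhead observed for m3.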

In conclusion, these measurements show that it is possible
to build a syscall interception framework that provides
flexible configuration with acceptable overhead. The
evaluation did not reveal a single case that prevents production
use of Apate.


V. Conclusion 

This paper presented Apate, a Linux Kernel Module for
hardening High Interaction Honeypots. Apate works at the
system call level, is able to log, block, and manipulate these
calls, and uses an easy-to-use yet powerful configuration
language. The evaluation shows that Apate has a moderate
performance overhead and can be used in production honeypot
systems. Apate is also stealthy enough for most common usage
scenarios. Overall, Apate is an ideal basis and important building
block for upcoming High Interaction Honeypot systems.